The data science technology the International Consortium of Investigative Journalists (ICIJ) used to explore the papers appears to be powered by Linkurious (which appears to be a graph visualization and analysis software) . The ICIJ website did showcase some of the network visualization on its webpage.
Hence, right at the start, My aim was to explore both the panama-paradise papers with network packages available in R. This notebook should serve as a exploratory of panama-paradise papers, as well as documentating/introducing network package in R using (igraph)[http://kateto.net/networks-r-igraph] as a core. Also, given the amount of nodes and Edges, I choose to explore only interactive network plots.
run require R packages.
#general
require(purrr)
require(tidyverse)
require(data.table)
require(lubridate)
require(stringr)
require(ggvis)
require(ggplot2)
require(forcats)
require(ggmap)
require(highcharter)
require(broom)
require(plotly)
require(stringi)
#network plot
require(igraph)
require(ggmap)
require(sna)
require(intergraph)
require(ggnetwork)
require('visNetwork')
require(viridis)
# achieve/appendices
require(GGally)
require(networkD3)
csv files to be read.
Entities <- as.data.table(read.csv(file="../input/Entities.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))
Addresses <- as.data.table(read.csv(file="../input/Addresses.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))
Intermediaries <- as.data.table(read.csv(file="../input/Intermediaries.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))
Officers <- as.data.table(read.csv(file="../input/Officers.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))
Edges <- as.data.table(read.csv(file="../input/all_edges.csv",na.strings=c("","NA"), stringsAsFactors = FALSE))
data_list<- c("incorporation_date","inactivation_date","struck_off_date","dorm_date")
Entities[,(data_list):=lapply(.SD,parse_date_time,orders="%d-%m-%Y"),
.SDcols=data_list]
*Links/Edges - A data frame object with the links between the nodes. It should include the Source (from) and Target (to) for each link, as well as other properties (depending on package being use). Sadly, the provided datasets contain no flow of finance/money, just relationship between identities.
*Nodes/Vertices - A data frame containing the node id and properties of the nodes. (Address, Enties, Intermediaries, Officers in this case)
The .csv given are also rather well structured to be adapted into network packages - Entities, Addresses, Intermediaries, and Officers datasets llisting are all attributed to a “node_id”, while the “node_1” and “node_2” column in Edge.csv seems to be describing the relationship between said nodes.
The very first thing I realise is that it is impossible to relistically plot ~1.5million edges and ~900k nodes in a single plot. Even if the hardware/software could somehow support it, it would not be comprehensible to human. This issue should be clearer after seeing some networks plots later.
Thus, given that one can only shown limited edges and nodes in a network plots. It seems reducing the respective node_id to respective unique instance of countries would work (both nodes and edges would require processing).
Alternatively, one could cluster the nodes, then proceed to subplot the network. This is done in later section, although this method did not occur to me until I am done with country based nodes and started examining the clusters.
Binded all the identities in the nodes (Entities, Intermediaries, Officers). I excluded address data sets from this nodes as all the identities are already tied to their respective geographical countries.
## Nodes
# Combining various identities and label them
Nodes<-rbind(Entities[,.(node_id,countries, country_codes, "Entities")],
Intermediaries[,.(node_id,countries, country_codes, "Intermediaries")],
Officers[,.(node_id,countries, country_codes, "Officers")])
setnames(Nodes, "V4", "Identity") # data.table method to rename
# colnames(Nodes)[4]<- "Identity"
While there is already some nodes depicting of the country names and codes as “Not Identify”, some of the countries column in the node are left emtpy/NA.
Instance where,
country=NA - is changed to “Unknown” for mapping and aggregating purpose
country=“Not Identify” - is left alone
Nodes<-Nodes[is.na(countries), ':='(countries= "Unknown", country_codes = "XXX")]
Creating a node_id map to country id.
## Records listed for single Country
IndividualCountry_Nodes<-Nodes[!grep(";",countries)] %>% # for single country listing
# creating id column that is unique to per country
.[,id:=.GRP, by= countries]
# Creating a unique Mapping of Country to ID
Country2ID_Map<-IndividualCountry_Nodes[,.(id,countries)]%>%
unique(., by = c("countries","id"))
#Number of countries
IndividualCountry.Agg<-Nodes[!grep(";",countries),] %>%
.[,.N,by=c("countries", "country_codes", "Identity")] %>%
.[order(-N)] %>%
.[, if(sum(N)> 5000) .SD, by=c("countries")] # filtering for only countries with more than 5k listings
# plot
hchart(IndividualCountry.Agg, "column", hcaes(x = countries, y = N, group = Identity))
The country column for those with muliple country name listed seperate instance of a country. For example,
“British Virgin Islands;Hong Kong”,
“Hong Kong;British Virgin Islands”
This would introduce an extra 569 nodes ( due to the combinatin of countries), hence should be eliminated/merged away, not to metion these listing also begs the questions of where to list them on a physical world map.
# For records listing multiple countries, most of them are Entities.
# data.frame(table(IndividualCountry_Nodes$Identity))
## CrossCountry Nodes, which listed multiple countries seperated with ";",
CrossCountry_Nodes<-Nodes[grep(";",countries)]
# For records listing multiple countries, most of them are Entities.
# data.frame(table(CrossCountry_Nodes$Identity))
#Number of countries
CrossCountry_Nodes.Agg<-CrossCountry_Nodes %>%
.[,.N,by=c("countries", "country_codes", "Identity")] %>%
.[order(-N)]%>%
head(30)
# plot
hchart(CrossCountry_Nodes.Agg, "column", hcaes(x = countries, y = N, group = Identity))
# At first I thought about ignoring these, but then, these might hold valueble information regarding links, given the links between one entities/intermediates and another.
This should be combined for the count. Reodering of the list country terms after a strsplit to counteract this.
## "British Virgin Islands;Hong Kong" is listed as seperated count as
# "Hong Kong;British Virgin Islands", Hence, need to combine them in same counts
### This section split the strings in countries column, which is then reordered and combined back into its original column
## helper function for vapply()
striHelper <- function(x) stri_c(x[stri_order(x)], collapse = ";")
CrossCountry_Nodes$countries<-vapply(strsplit(CrossCountry_Nodes$countries, ";"), striHelper, ";")
CrossCountry_Nodes$country_codes<-vapply(strsplit(CrossCountry_Nodes$country_codes, ";"), striHelper, ";")
# Raw Number Aggregation, can also be use to check the reordering of strings
CrossCountryOccurance<-CrossCountry_Nodes %>%
.[,.N, by = c("countries", "country_codes")] %>%
.[order(-N)]
I decided on taking only the 1st column after reodering to simplify the tracking of countries, this would introduce some bias into the data due to the ordering of countries names by alphabetical order.
In essence, this would convert
“British Virgin Islands;Hong Kong” ->“British Virgin Islands”
“Hong Kong;British Virgin Islands” ->“British Virgin Islands”
# #Spliting the multiple countries listed
t.splits <- max(lengths(strsplit(CrossCountry_Nodes[,countries], ";")))
#
# t.test <- CrossCountry_Nodes[,.(countries,country_codes)] %>%
# .[, paste0("m.countries",1:t.splits):=tstrsplit(countries,";")] %>%
# melt(., measure.vars = patterns("^m.*"), na.rm = T) %>%
# .[,.N, by=c("value","countries")] %>%
# .[order(-N)] %>%
# .[, if(sum(N)> 500) .SD, by=c("countries")] # filtering for only countries with more than 500 listings
#
# # plot
# hchart(t.test, "column", hcaes(x = value, y = N, group = countries))
CrossCountry_Nodes<-CrossCountry_Nodes%>%
.[, paste0("m.countries",1:t.splits):=tstrsplit(countries,";")] %>%
# this would merge the country uniqiue id on "m.countries1" column, hence introduce slight bias into the data
# Perhaps a double merge approach might be better? such that both of the listed countries are each melted into a entry
# It would be messy though.
.[Country2ID_Map, on=c(m.countries1 = "countries"), nomatch= 0]
Combining both the indiviual and cross listing country’s node_id to country_id
## And Thus we finnaly have our node_id to country id ready
Bind_Country2ID_Map<-rbindlist(
list(
CrossCountry_Nodes[,.(node_id, m.countries1, Identity, id)],
IndividualCountry_Nodes[,.(node_id, countries, Identity, id)]
)
)
Pulling only “node_id” from Egdes as attributes
##Edges
Edges_simplified<-Edges[,.(node_1, node_2)]
# Edges_simplified[complete.cases(Edges_simplified)]
Steps taken:
Applying the previously constructed Country2ID Map
Aggregate the relationship/connection between countries, summing each incidence (set as weight).[Simple graph]
#merging data table, edges and nodes
Country_id_Edges<-Edges_simplified %>%
.[Bind_Country2ID_Map, on=c(node_1 = "node_id"), nomatch= 0] %>%
.[Bind_Country2ID_Map, on=c(node_2 = "node_id"), nomatch= 0] %>%
.[,.(id,i.id)]%>% #the "ID" is derived from country ID from node_1, the second - "I.ID" is derived from node_2
.[, .N, by=c("id","i.id")]
colnames(Country_id_Edges)<- c("from", "to", "weight")
Pin pointing the respective country nodes on map with geocoding
# with ggmap version 2.6 and geocoding withing a key, it is possible for one to ran into OVER QUERY LIMIT with just a couple geocode (as the quote for query is shared).
# Hence, to get it working perfectly, currenctly, one has to install ggmap v2.7 ( through github only atm), and register a google key
# devtools::install_github("dkahle/ggmap")
# install.packages("geosphere")
## To get a API key from google API
# https://developers.google.com/maps/documentation/geocoding/get-api-key
# https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode
register_google(key = "AIzaSyChW6mLIfjq1NlCd1nxg_A6z1jgtTdVmek")
filelist <- list.files("../input")
if(any(filelist=="geocodes_df.rds")){
#read the created .rds containing the require data
geocodes_df <- readRDS("../input/geocodes_df.rds")
}else{
# using geocodes ( part of ggmap package) to find the lat and lon
# perhaps not the cleanest way, some of the location will not be the most accurate.
geocodes_df <- geocode(Country2ID_Map$countries)
saveRDS(geocodes_df, "../input/geocodes_df.rds")
}
# bind extracting coordinates into the Nodes, to be use as attributes for plotting later
CountryIDNodes<-cbind(Country2ID_Map,geocodes_df)
Given the number of Edges, and that we are probably more interested in links that are most significant, perhaps the edgeshould be filtered by weight before plotting into network graph
summary(Country_id_Edges$weight)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1.0 2.0 5.0 283.9 21.0 173200.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 2.0 5.0 283.9 21.0 173200.0
It can be seen that the majority of the connections/edges between countries nodes are rather week, even the 3rd Quantile of the weight is only mere 21, while it’s mean is 283.9, Ie a very right skewed distribution with very very long tail.
This method of achieve interactive network plots roughly followed what shown at (minimaxir)[http://minimaxir.com/2016/12/interactive-network/]
Pros
Issue
Directionality in Edges is not shown in plotly - Thus, directional edges are shown only as lines between nodes.
No self loop/arrow can be showned - This is significant in this case, as self loop can be indicative of considerable amoung of illicit relationship/money movement within a country.
Failed get the geom edges’s weight to be depict as width/size after ggplotly(), the
To reduce the amount of Edges plotted, the weight can be thresholded to include only those > ~ Mean value. This threshold can also then be applied to remove countries nodes that did not shown consideration edges/relationship in the extracted datasets.
In this case, I assume that the relationship is direction, from “node_1” to “node_2”
net <- graph.data.frame(Country_id_Edges[weight>=285, ],
CountryIDNodes[id %in%
sort(unique(
c(
Country_id_Edges[weight>=285]$from,
Country_id_Edges[weight>=285]$to)
))],
directed = TRUE)
Some functions of igraph that would aid in interpretating network plots.
Centrality - igenvector centrality scores correspond to the values of the first eigenvector of the graph adjacency matrix; these scores may, in turn, be interpreted as arising from a reciprocal process in which the centrality of each actor is proportional to the sum of the centralities of those actors to whom he or she is connected.
Betweenness - The vertex and edge betweenness are (roughly) defined by the number of geodesics (shortest paths) going through a vertex or an edge.
Degree - The degree of a vertex is its most basic structural property, the number of its adjacent edges
#igraph, creating the graph entities while filtering for weight
Nodes_betweenness<- igraph::betweenness(net)
#### Nodes Enchancement
V(net)$degree <- igraph::degree(net, mode = "all")
V(net)$betweenness <-log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))
V(net)$centrality <- eigen_centrality(net, weights=E(net)$Weight)$vector
V(net)$community <- colorize(V(net)$community)
V(net)$text <- V(net)$countries
V(net)$color <- colorize(V(net)$degree)
#### Edge Enhancement
#Need to manually alocate the Edge lat,lon to appropriate coordinates
end_loc <- data.table(ename=as.integer(get.edgelist(net)[,2]))
end_loc<- CountryIDNodes[end_loc, on= c(id="ename"), nomatch= 0]
start_loc <- data.table(ename=as.integer(get.edgelist(net)[,1]))
start_loc<- CountryIDNodes[start_loc, on= c(id="ename"), nomatch= 0]
### Setting coordinates of edges arrow
E(net)$endlat <- end_loc$lat
E(net)$endlon <- end_loc$lon
E(net)$startlat <- start_loc$lat
E(net)$startlon <- start_loc$lon
### Scaling of weight
# applying a logarithm scale to recale the weight from 0 to 1
E(net)$weight<-log(1+E(net)$weight)/log(1+max(E(net)$weight))
Customised variables on plotly
Nodes Size - centrality, the centrality of each country is proportional to the sum of the other country nodes that it is connected with. This parameter would be influence by the weight of the edges connected and self looping edges too. (Larger -> more central to the network)
Nodes colour - Degree, how many connection one nodes has (lighter colour means that a particular country nodes has more connections) [counting both in and out connections]
The actual code chunks for plotting
df_net <- ggnetwork(net, layout = "kamadakawai", weights="weight")
# the ggnetwork essentially convert the igraph structure 'net' into a dataframe, which is more easy and famlier to work with, but this is also very limiting.
plot <- ggplot(arrow.gap = 0.025) +
borders("world",
colour ="black", fill="#7f7f7f", size=0.10, alpha=1/2)+
geom_edges(data = df_net,aes(x = lon, y = lat, xend = endlon, yend = endlat),
size = 0.4, alpha=0.25 , #size parameter in geom edge is not passed over correctly into ggplotly, it seems to be carry over to borders(country) in plotly too
arrow = arrow(length = unit(10, "pt"), type = "closed")) +
geom_nodes(data=df_net,aes(x=lon, y=lat, xend=endlon,yend=endlat,
size=centrality, colour=sqrt(degree), text=text)) +
scale_colour_viridis() +
ggtitle("Relationship of Countries with various nodes") +
## geom_map would provide a nicer map, but proved to be problematic for ggplotly
# geom_map(data=world, map=world, aes(x=long, y=lat, map_id=region),
# color="white", fill="#7f7f7f", size=0.05, alpha=1/4) +
guides(size=FALSE, color=FALSE) +
theme_blank()+
# https://github.com/ropensci/plotly/issues/842
theme(legend.position='none') #translate to hide legend in plotly
plot %>% ggplotly(tooltip="text")
#%>% toWebGL()
#issue, arrow head doesn't get translated into plotly via ggplotly
# no self loop is shown
Currently, toWebGL() is disable, as doesn’t appear to be very stable - the world map/countries borders are sometimes not shown when it is enable.
In the exposed network trails, it seems British Virgin Island is among the most connected country nodes, followed by Unknown (identities that did not list a country) and Bahama. Other major country nodes with higher degree(colour) and centrality(size) is Hong Kong, Singapore, UAE, Cyprus, Switzerland, not identify and UK.
The highest centrality and degreeness of the countrys can also be directly identify and ploted.
by_countries.raw<-setDT(df_net)[is.na(endlat),
.(countries, betweenness,centrality,degree)][order(-centrality)]%>%
.[,name:=as.character(countries)] %>%
head(30)
x<- c("Country", "Betweenness", "Centrality","Degree")
y<- c("{point.countries}", (sprintf("{point.%s:.2f}", c("betweenness", "centrality","degree"))))
tltip<- tooltip_table(x,y)
hchart(by_countries.raw, "scatter", hcaes(centrality, betweenness, size= degree, color=degree), dataLabels=list(enabled=T, format = '{point.countries}'))%>%
hc_title(text="Network Attributes of Country Nodes in panama-paradise papers")%>%
hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)
# hc_yAxis(type="logarithmic")%>%
# hc_xAxis(type="logarithmic")
#
# df_net <- ggnetwork(net, layout = "fruchtermanreingold", weights="weight", niter=50000, arrow.gap=0)
# # layout = "kamadakawai"
# # arrow.gap = 0.025 #
# # arrow gap default value for directed graph, but the arrows aren't carried over in plottly
# # niter - This argument controls the number of iterations to be employed. Larger values take longer, but will provide a more refined layout. (Defaults to 500.)
#
# plot <- ggplot() +
# geom_edges(data = df_net,aes(x = x, y = y, xend = xend, yend = yend),
# size=0.4, alpha=0.25) +
# geom_nodes(data = df_net,aes(x = x, y = y, xend = xend, yend = yend,
# size = degree, color = degree, text=text)) +
# ggtitle("Relationship of Countries with various nodes") +
# scale_colour_viridis() +
# ## geom_map would provide a nicer map, but proved to be problematic when chaining through ggplotly
# # geom_map(data=world, map=world, aes(x=long, y=lat, map_id=region),
# # color="white", fill="#7f7f7f", size=0.05, alpha=1/4) +
# # scale_color_manual(labels=c("EWR", "JFK", "LGA", "Others"),
# # values=c(colors, "#1a1a1a"), name="Airports") +
# guides(size=FALSE, color=FALSE) +
# theme_blank()+
# # https://github.com/ropensci/plotly/issues/842
# theme(legend.position='none') #translate to hide legend in plotly
#
#
#
# #raw plot
# plot
#
# #plotlly plot
# plot %>% ggplotly(tooltip="text")
Official method for R, what i done here roughly follows (plotly)[https://plot.ly/r/network-graphs/]. I modifying some additional scatter attributes as well as slightly improve the codes(in the way of R).
## Generating a layout for nodes on x,y using igraph
# L<-layout.circle(net) #deprecated
Layout_For_Nodes<-layout_(net, nicely())
CountryN_nodes<-data.table(as_data_frame(net, what = c("vertices")))
CountryN_edges<-data.table(as_data_frame(net, what = c("edges")))
#Combining the layout with extra attributes
Layout_For_Nodes<-data.frame(cbind(Layout_For_Nodes,CountryN_nodes))
setnames(Layout_For_Nodes, "V1", "x.coor") # data.table method to rename
setnames(Layout_For_Nodes, "V2", "y.coor") # data.table method to rename
# Create Nodes
network<- plot_ly(data=Layout_For_Nodes, type='scatter',x=~x.coor, y=~y.coor,
color=~centrality, size=~degree*10, mode="markers", text=~text, hoverinfo="text")
# Edges "to" and "from"
CountryN_edges$from<- as.integer(CountryN_edges$from)
CountryN_edges$to<- as.integer(CountryN_edges$to)
## Reindexing the Verices/nodes
CountryN_nodes$name<-as.integer(CountryN_nodes$name)
CountryN_nodes$IDN<-as.numeric(factor(CountryN_nodes$name))
# Merged/Mapped the IDN column into "to" and "from" column in edges.
CountryN_edges<-CountryN_nodes[,.(name,IDN)][CountryN_edges, on = c(name= "from")] %>%
CountryN_nodes[,.(name,IDN)][., on = c(name= "to")]
# Dropping unnecessary columns and renaming
CountryN_edges$name<-NULL
CountryN_edges$i.name<-NULL
colnames(CountryN_edges)[1]<- "from"
colnames(CountryN_edges)[2]<- "to"
# Creating Edges
edge_shapes<- list()
for(i in 1:nrow(CountryN_edges)){
v0<-CountryN_edges[i,]$from
v1<-CountryN_edges[i,]$to
edge_shape=list(
type="line",
line=list(color="#030303", width=0.3),
x0=Layout_For_Nodes$x.coor[[v0]],
y0=Layout_For_Nodes$y.coor[[v0]],
x1=Layout_For_Nodes$x.coor[[v1]],
y1=Layout_For_Nodes$y.coor[[v1]]
)
edge_shapes[[i]]<-edge_shape
}
axis<-list(title="", showgrid=FALSE, showticklabels=FALSE, zeroline=FALSE)
p<-layout(network,
title="Plotly Network",
shapes=edge_shapes,
xaxis=axis,
yaxis=axis
)
p
CountryN_edges$id <- seq_len(nrow(CountryN_edges))
#
# map projection
geo <- list(
# projection = list(type = 'azimuthal equal area'),
showland = TRUE,
landcolor = toRGB("gray95"),
countrycolor = toRGB("gray80")
)
plot_geo()%>%
add_markers(data= CountryN_nodes,
x = ~lon, y = ~lat, size = ~degree,
color = ~centrality, hoverinfo = "text", #, colorscale='Viridis'
text =~ ~paste(L$countries, "<br />",
"centrality: ", signif(L$centrality,2), "<br />",
"betweenness: ", signif(L$betweenness,2), , "<br />",
"degree: ", L$degree,, "<br />"))%>%
add_segments(
data = group_by(CountryN_edges,id),
x = ~startlon, xend = ~endlon,
y = ~startlat, yend = ~endlat,# width=~weight,
alpha = 0.3, size = I(1), hoverinfo = "none"
)%>%
layout(
title = 'Country Nodes Network on Map',
geo = geo, showlegend = FALSE
)
## Passing layout here doesn't seem to work
# layout(network,
# title="Plotly Network",
# shapes=edge_shapes,
# xaxis=axis,
# yaxis=axis
# )
Pros
Issue
Documentation is sparse - I based mine on its R equivalent (website)[http://jkunst.com/highcharter/hchart.html], but I could not find any igraph related stuff on highcharter API (where it is origin and generally better documented, although in Java).
No directional arrow - Couldn’t get it to display directional arrow in the network plots
No obvious method to fix the nodes at specific location.
Plotting variable
Nodes colour - membership, which membership of country nodes in the network densely connected subgraphs, also called communities in a graph via random walks.
Nodes Size - Degree, how many connection one nodes has (lighter colour means that a particular country nodes has more connections) [counting both in and out connections]
#igraph, creating the graph entities while filtering for weight
Nodes_betweenness<- igraph::betweenness(net)
membership<- membership(cluster_walktrap(net))
#### Nodes Enchancement
V(net)$degree<-igraph::degree(net, mode = "all")
V(net)$betweenness<-log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))
V(net)$centrality<-eigen_centrality(net, weights=E(net)$Weight)$vector
V(net)$text<-V(net)$countries
V(net)$color<-colorize(membership)
V(net)$size<-V(net)$degree
#### Edge Enhancement
#Need to manually alocate the Edge lat,lon to appropriate coordinates
end_loc<-data.table(ename=as.integer(get.edgelist(net)[,2])) %>%
.[CountryIDNodes, on= c(ename="id"), nomatch= 0]
### Setting coordinates of edges arrow
E(net)$endlat<-end_loc$lat
E(net)$endlon<-end_loc$lon
### Scaling of weight
# applying a logarithm scale to recale the weight from 0 to 1
E(net)$weight<-log(1+E(net)$weight)/log(1+max(E(net)$weight))
E(net)$width<-E(net)$weight*3
#Doesn't appearst to be working
# E(net)$arrow.size<- 12
hchart(net, layout=layout_with_kk)%>%
hc_title(text="Network Attributes of Country Nodes in panama-paradise papers")
### couldn't get the nodes to be fix on respective coordinate of the countries.
# hchart(net, layout=as.matrix(geocodes_df))
# Error in UseMethod("layout") : no applicable method for 'layout' applied to an object of class "igraph"
Pros
Well documented on their Website
Both Igraph layout, fix X and Physics based graph can be generated
Appears to be running on Javascript/html
Issues
No direct method to overlay the nodes onto a map
While it is possible to force them into respective coordintes(x, y) and disable Physics(so they remain statis at the location), to properly display them would require some work on scaling. There is also no direct method to overlay the nodes on a world map (from what i can discern from google and documentation)
#thresholding
vis_edge<-Country_id_Edges[weight>=285,]
vis_node<-CountryIDNodes[id %in% sort(unique(
c(
Country_id_Edges[weight>=285]$from,
Country_id_Edges[weight>=285]$to)
))]
# using igraph to calculate some betweenness and degree
net<-graph.data.frame(vis_edge, vis_node, directed = TRUE)
Nodes_betweenness<-igraph::betweenness(net) # Node size
Nodes_Degree<-igraph::degree(net, mode = "all")
## Enchancement
## ?visNodes
vis_node$shape <- "dot"
vis_node$shadow <- TRUE # Nodes will drop shadow
vis_node$label <-vis_node$countries
vis_node$title <- vis_node$countries
vis_node$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 25 #default to 25
vis_node$borderWidth <- 2 # Node border width
vis_node$color.background <- colorize(Nodes_Degree)
vis_node$color.border <- "black"
vis_node$color.highlight.background <- "orange"
vis_node$color.highlight.border <- "darkred"
## Defining starting position of nodes as coordinates of the countries, so that their location of on graph would bear some semblance to their respective location on the map ( ie, Australia is down south etc)
vis_node$x<- vis_node$lon+180
vis_node$y<- -vis_node$lat+90
## Physics can be disable so the nodes would not be moved from the initial location (lat/lon), this is not used as it generated a plot that is rather hard to read.
# vis_node$physics<- F
# vis_edge$physics<- T
# ?visEdges
vis_edge$shadow <- FALSE # edge shadow
vis_edge$width <-log(1+vis_edge$weight)/log(1+max(vis_edge$weight)) # default to 1
vis_edge$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
set.seed(1)
visNetwork(edges=vis_edge, nodes=vis_node, main="Aggregated Network plot of Countries Nodes Network")%>%
visOptions(highlightNearest = TRUE)
## While the Initial zoom level can be setup, this require either to disable visPhysics's Stabilization or the use of visIgraphLayout, which would sacrifice the the cleanliness of the plot
## Choosing to true off stabilization option in physics would hence require the stabilization iteration to be plotted, aesthetically and physically impressive but not useful
# visEvents(type = "once", startStabilizing = "function() {
# this.moveTo({scale:0.5})}") %>%
# visPhysics(stabilization = FALSE)%>%
# %>% visIgraphLayout()
## While it yield a ok map with the Igraph Layout, it is relatively messy as the nodes and edges can be in close proximity with one another.
You will have to scroll your mouse3 to zoom towards the network plots, unfortunately setting initiall zoom level brought about some undesirable side effects, at least for the methods i tried.
Country coordinates(lat, lon) of respective nodes are used as starting location of the network plot. Hence, the final location of nodes ( countries) should bear some resemblance to their respective location on the world map
This network plot methods works pretty well though, asside from requiring edges to be index at 0. This is further complicated by the fact that thresholded nodes_id aren’t even continuous.
vis_edge<-vis_edge[order(from, to)]
el <- data.frame(from=vis_edge$from,
to=vis_edge$to,
value = vis_edge$width)
# http://www.r-graph-gallery.com/253-custom-network-chart-networkd3/
## Suggested method of reindexing the id, probably only works if your id is continously
# vis_node$id=as.numeric(as.factor(vis_node$id))-1
## Reindexing the nodes as d3 network/javascript are zero index
#Create a zero index column IDN
vis_node$IDN=as.numeric(factor(vis_node$id))-1
# Merged/Mapped the IDN column into "to" and "from" column in edges.
vis_edge_d3<-vis_node[,.(id,IDN)][vis_edge, on = c(id= "from")] %>%
vis_node[,.(id,IDN)][., on = c(id= "to")]
# Dropping unnecessary columns and renaming
vis_edge_d3$id<-NULL
vis_edge_d3$i.id<-NULL
colnames(vis_edge_d3)[1]<- "from"
colnames(vis_edge_d3)[2]<- "to"
# forceNetwork(Links = vis_edge_d3, Nodes = vis_node,
# # plotting parameters
# Source="from", Target="to", Value = "width",
# Group = "color.background", NodeID="countries",
# # Nodesize=6,
# opacity = 0.8,
# opacityNoHover = 0.4,
# radiusCalculation = JS(" d.nodesize^2+10"),
# linkColour = "#afafaf",
# linkWidth = JS("function(d) { return Math.sqrt(d.value); }"),
#
# # layout
# charge = -250, # if highly negative, more space betqeen nodes
#
# # general parameters
# arrows=TRUE,
# fontSize=17,
# zoom = TRUE,
# legend=F,
# width = NULL,
# height = NULL
# )
Loading the rendered plot of d3 network on knited rmd/html on chrome seems to be causing absurd amount of error and thus crashing it. It, however, appears to be running fine on Edge explorer.
## Nodes
#merging them all on node_id doesn't seems to result in a very useful plot, the codes for merging that I tried running can be found in the appendices section.
# Combining various identities and label them
Nodes<-rbind(
Entities[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Entities")],
Intermediaries[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Intermediaries")],
Officers[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Officers")],
Addresses[,.(node_id, countries, country_codes, sourceID, Identity="Addresses")]
, fill=TRUE)
#I initially thought that address wouldn't be needed in to full network diagram, but later found out that if I exlude the addresses datasets, I couldn't form a network graph some of the nodes require connection to the node_id that can only be found in address datasets.
Removing duplicate listing of some nodes which share id but exist is different identitites.
# These combined dataframe of nodes is not directly network graphable. As the node_id is not unique, ie. Below we explore these non unique node_id records.
Non_unique_ID <-Nodes[, fD := .N > 1, by = node_id][fD==TRUE] %>%
.[order(node_id)]
Non_unique_ID %>%head(30)
# So, apparently some ID have entires for both Intermediaries and Officers, which probably a simply row_bind to combine them, as in these case m the node_id would not be unique.
# after some testing, it appears that such issue only occurs between intermediaries and officers.
# Dropping the officers row if the node_id is already occupied by an intermediate.
Nodes<-Nodes[!(fD==TRUE & Identity=="Officers")]
# Dropping the fD column as it is no longer needed.
Nodes$fD <- NULL
Simpilfying the Edges to Make it possible to visuallise it with line type
# While I intend to use different arrows type for the disply of Edges witin the network plot, there are simply far too many relationship types as indicated by the rel_type column in Edges. Although the majority of the relationship are well covered by the top 30 types
# Hence, I will simplify it by defining 3 type of edges,
# 1) Identical relationship (only within top 30 types)#same name as
# 2) Directional relationship (only within top 30 types) #intermediary of/shareholder of/director of
# 3) Others (those not inlcuded in top 30 most popular relationship)
popular_rel_type<-Edges[,.N, by=rel_type] %>%
.[order(-N)] %>%
head(30)
# within the top 30 most common relationship
identical_relation_list <- c("similar name and address as",
"same name as",
"same company as",
"same name and registration date as",
"same address as")
Edges[rel_type %in% popular_rel_type$rel_type, Edge_Type:=1]%>%
.[!(rel_type %in% popular_rel_type$rel_type), Edge_Type:=2]%>%
.[rel_type %in%identical_relation_list, Edge_Type :=3]
##Edges
Edges_simplified<-Edges[,.(node_1, node_2, rel_type, Edge_Type, sourceID)]
colnames(Edges_simplified) <-c("from", "to", "rel_type", "edge_type", "sourceID")
## Setting network graph into directed to examine the all connections and out connections of nodes
net <- graph.data.frame(Edges_simplified, vertices=Nodes, directed = T)
### Degree, the connections of edges
# nodes_degree_all <- igraph::degree(net, mode = "all")
# nodes_degree_out <- igraph::degree(net, mode = "out")
# The degree of a vertex is its most basic structural property, the number of its adjacent edges.
### Betweenness, number of shortest path going through vertext,
### It doesn't seems sensible to examine the network plot with this
# nodes_betweenness<- igraph::betweenness(net)
## The vertex and edge betweenness are (roughly) defined by the number of geodesics (shortest paths) going through a vertex or an edge.
# nodes_centrality <- eigen_centrality(net)
## Eigenvector centrality scores correspond to the values of the first eigenvector of the graph adjacency matrix; these scores may, in turn, be interpreted as arising from a reciprocal process in which the centrality of each actor is proportional to the sum of the centralities of those actors to whom he or she is connected.
## allocating the calculated nodes attributes into a dataframe
nodes_attributes<-data.table(
nodes_id=names(igraph::degree(net, mode = "all")),
nodes_degree_all=(igraph::degree(net, mode = "all")),
nodes_degree_out=(igraph::degree(net, mode = "out")),
nodes_betweenness=(igraph::betweenness(net)),
centrality=(eigen_centrality(net)$vector))
decomposed_graph_list<-decompose.graph(net)
# this return a list of seperate graph for each component
# plot(decomposed_graph_list[[231]])
##Calculation the number of members per decomposed graph and set it as a dataframe.
vcount_dt<-data.table(unlist(lapply(decomposed_graph_list,vcount)),keep.rownames=T)
vcount_dt$membership_id<-rownames(vcount_dt)
setnames(vcount_dt, "V1", "vcount")
vcount_dt[order(-vcount)]%>%head(10)
## Choosing clusters of different size to plot
#large id=991, N=406
#medium id=185, N=166
#small id=5050, N=16
# plot(decomposed_graph_list[[1]])
A total of 10,633 individual none connected clusters, although the majoirty of the nodes have already been linked in community1, which has the majortiy of the nodes, numbered at 942172 members/nodes.
subnodes_large<-data.table(as_data_frame(decomposed_graph_list[[991]], what = c("vertices")))
subedges_large<-data.table(as_data_frame(decomposed_graph_list[[991]], what = c("edges")))
subnodes_medium<-as_data_frame(decomposed_graph_list[[185]], what = c("vertices"))
subedges_medium<-as_data_frame(decomposed_graph_list[[185]], what = c("edges"))
subnodes_small<-as_data_frame(decomposed_graph_list[[5050]], what = c("vertices"))
subedges_small<-as_data_frame(decomposed_graph_list[[5050]], what = c("edges"))
# Exploring centrality
High_Centrality_Nodes<-nodes_attributes[centrality>=0.002681][order(-centrality)]%>%head(30)
# Changing the class of nodes_id to interget for Merging
High_Centrality_Nodes$node_id<- as.integer(High_Centrality_Nodes$nodes_id)
# Merging with original nodes to acquire nodes attributes
H_Centrality_dt<-Nodes[High_Centrality_Nodes, on=c(node_id="node_id")]
##Highcharter
#tooltip table
x<- c("NodeName", "Degree all","Degree out","Betweenness", "Centrality")
y<- c("{point.nameID}", (sprintf("{point.%s:.2f}",
c("nodes_degree_all", "nodes_degree_out",
"nodes_betweenness", "centrality"))))
tltip<- tooltip_table(x,y)
#plot
hchart(H_Centrality_dt, "scatter", hcaes(centrality, nodes_betweenness,
size= nodes_degree_all, color=nodes_degree_out), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
hc_title(text="Network Attributes of Nodes in panama-paradise papers")%>%
hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)%>%
hc_yAxis(type="logarithmic")%>%
hc_xAxis(type="logarithmic")
#Exploring degrees
# nodes with most in connections
Most_in_connected<-nodes_attributes[order(nodes_degree_out, -nodes_degree_all)]%>%head(30)
Nodes[node_id %in% Most_in_connected$nodes_id]
# nodes with most outgoing connections
Most_out_connected<-nodes_attributes[order(-nodes_degree_out, nodes_degree_all)]%>%head(30)
Nodes[node_id %in% Most_out_connected$nodes_id]
# using igraph to calculate some betweenness and degree
subnet_large<-graph.data.frame(subedges_large, subnodes_large, directed = TRUE)
Nodes_betweenness<-igraph::betweenness(subnet_large) # Node size
Nodes_Degree<-igraph::degree(subnet_large, mode = "all")
# Enchancement
# ?visNodes
subnodes_large$id<- subnodes_large$name
subnodes_large$shadow <- TRUE # Nodes will drop shadow
subnodes_large$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 25 #default to 25
subnodes_large$borderWidth <- 2 # Node border width
subnodes_large$color.background <- colorize(Nodes_Degree)
subnodes_large$color.border <- "black"
subnodes_large$color.highlight.background <- "orange"
subnodes_large$color.highlight.border <- "darkred"
subnodes_large$shape <- factor(subnodes_large$Identity,
levels=c("Entities","Intermediaries","Officers","Addresses"),
labels=c("dot","triangle","square","diamond"))
subnodes_large$label <-subnodes_large$nameID
subnodes_large$title <- paste0("<p>",subnodes_large$nameID,"<br>",subnodes_large$countries,"</p>")
# ?visEdges
subedges_large$shadow <- FALSE # edge shadow
subedges_large$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_large$dashes <- (subedges_large$edge_type==3)
subedges_large$label<- subedges_large$rel_type
set.seed(1)
visNetwork(edges=subedges_large, nodes=subnodes_large, main="Extracted Large Cluster") %>%
# height="400px", width="100%") %>%
visIgraphLayout() %>%
visOptions(highlightNearest = list(enabled=T, degree=1, hover=F)) %>%
visNodes(scaling = list(min = 10, max = 50))
# using igraph to calculate some betweenness and degree
subnet_medium<-graph.data.frame(subedges_medium, subnodes_medium, directed = TRUE)
Nodes_betweenness<-igraph::betweenness(subnet_medium) # Node size
Nodes_Degree<-igraph::degree(subnet_medium, mode = "all")
# Enchancement
# ?visNodes
subnodes_medium$id<- subnodes_medium$name
subnodes_medium$shadow <- TRUE # Nodes will drop shadow
subnodes_medium$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 10 #default to 25
subnodes_medium$borderWidth <- 2 # Node border width
subnodes_medium$color.background <- colorize(Nodes_Degree)
subnodes_medium$color.border <- "black"
subnodes_medium$color.highlight.background <- "orange"
subnodes_medium$color.highlight.border <- "darkred"
subnodes_medium$shape <- factor(subnodes_medium$Identity,
levels=c("Entities","Intermediaries","Officers","Addresses"),
labels=c("dot","triangle","square","diamond"))
subnodes_medium$label <-subnodes_medium$nameID
subnodes_medium$title <- paste0("<p>",subnodes_medium$nameID,"<br>",subnodes_medium$countries,"</p>")
# ?visEdges
subedges_medium$shadow <- FALSE # edge shadow
subedges_medium$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_medium$dashes <- (subedges_medium$edge_type==3)
subedges_medium$label<- subedges_medium$rel_type
set.seed(1)
visNetwork(edges=subedges_medium, nodes=subnodes_medium, main="Extracted Medium Cluster") %>%
# visIgraphLayout() %>%
visOptions(highlightNearest = TRUE)
# using igraph to calculate some betweenness and degree
subnet_small<-graph.data.frame(subedges_small, subnodes_small, directed = TRUE)
Nodes_betweenness<-igraph::betweenness(subnet_small) # Node size
Nodes_Degree<-igraph::degree(subnet_small, mode = "all")
# Enchancement
# ?visNodes
subnodes_small$id<- subnodes_small$name
subnodes_small$shadow <- TRUE # Nodes will drop shadow
subnodes_small$label <-subnodes_small$countries
subnodes_small$title <- subnodes_small$nameID
subnodes_small$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 10 #default to 25
subnodes_small$borderWidth <- 2 # Node border width
subnodes_small$color.background <- colorize(Nodes_Degree)
subnodes_small$color.border <- "black"
subnodes_small$color.highlight.background <- "orange"
subnodes_small$color.highlight.border <- "darkred"
subnodes_small$shape <- factor(subnodes_small$Identity,
levels=c("Entities","Intermediaries","Officers","Addresses"),
labels=c("dot","triangle","square","diamond"))
subnodes_small$label <-subnodes_small$nameID
subnodes_small$title <- paste0("<p>",subnodes_small$nameID,"<br>",subnodes_small$countries,"</p>")
# ?visEdges
subedges_small$shadow <- FALSE # edge shadow
subedges_small$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_small$dashes <- (subedges_small$edge_type==3)
subedges_small$label<- subedges_small$rel_type
set.seed(1)
visNetwork(edges=subedges_small, nodes=subnodes_small, main="Extracted Small Cluster") %>%
# visIgraphLayout() %>%
visOptions(highlightNearest = TRUE)
Dump/Achieved code chunks than I tried and failed to achieve adequate result.
This is achieved version my previous attempt to merge(only with node_id) all the datasets while preserving the individual columns by renaming them before merging.
Ultimately, this seems to yield a very unweidly data.frame/data.table (very sparse). Turns out most of the node_id only have one identiy(either entities/intermediaries/officers), There are some node_id with inputs for intermediaries and officers but it is rather rare.
rbind method that i used in my main routine works better in this case.
########
#
# ## rename the individual datasets, to combine them via node_id
# colnames(Entities) <- paste("Ent", colnames(Entities), sep = ".")
# colnames(Intermediaries) <- paste("Int", colnames(Intermediaries), sep = ".")
# colnames(Officers) <- paste("Off", colnames(Officers), sep = ".")
#
# ## Merging all the inputs together.
# testx<-Intermediaries[Entities, on= c(Int.node_id="Ent.node_id" )]%>%
# .[Officers, on = c(Int.node_id="Off.node_id")]
#
#
########
net <- graph.data.frame(Country_id_Edges[weight>=285, ],
CountryIDNodes[id %in%
sort(unique(
c(
Country_id_Edges[weight>=285]$from,
Country_id_Edges[weight>=285]$to)
))],
directed = TRUE)
#igraph, creating the graph entities while filtering for weight
Nodes_betweenness<- igraph::betweenness(net)
#### Nodes Enchancement
V(net)$degree <- igraph::degree(net, mode = "all")
V(net)$betweenness <-log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))
V(net)$centrality <- eigen_centrality(net, weights=E(net)$Weight)$vector
V(net)$community <- colorize(V(net)$community)
V(net)$text <- V(net)$countries
V(net)$color <- colorize(V(net)$degree)
#### Edge Enhancement
#Need to manually alocate the Edge lat,lon to appropriate coordinates
end_loc <- data.table(ename=as.integer(get.edgelist(net)[,2]))
end_loc<- CountryIDNodes[end_loc, on= c(id="ename"), nomatch= 0]
start_loc <- data.table(ename=as.integer(get.edgelist(net)[,1]))
start_loc<- CountryIDNodes[start_loc, on= c(id="ename"), nomatch= 0]
### Setting coordinates of edges arrow
E(net)$endlat <- end_loc$lat
E(net)$endlon <- end_loc$lon
E(net)$startlat <- start_loc$lat
E(net)$startlon <- start_loc$lon
### Scaling of weight
# applying a logarithm scale to recale the weight from 0 to 1
E(net)$weight<-log(1+E(net)$weight)/log(1+max(E(net)$weight))
Official method for R, what i done here roughly follows (plotly)[https://plot.ly/r/network-graphs/]. I modifying some additional scatter attributes as well as slightly improve the codes(in the way of R).
## Generating a layout for nodes on x,y using igraph
# L<-layout.circle(net) #deprecated
Layout_For_Nodes<-layout_(net, nicely())
CountryN_nodes<-data.table(as_data_frame(net, what = c("vertices")))
CountryN_edges<-data.table(as_data_frame(net, what = c("edges")))
#Combining the layout with extra attributes
Layout_For_Nodes<-data.frame(cbind(Layout_For_Nodes,CountryN_nodes))
setnames(Layout_For_Nodes, "V1", "x.coor") # data.table method to rename
setnames(Layout_For_Nodes, "V2", "y.coor") # data.table method to rename
# Create Nodes
network<- plot_ly(data=Layout_For_Nodes, type='scatter',x=~x.coor, y=~y.coor,
color=~centrality, size=~degree*10, mode="markers", text=~text, hoverinfo="text")
# Edges "to" and "from"
CountryN_edges$from<- as.integer(CountryN_edges$from)
CountryN_edges$to<- as.integer(CountryN_edges$to)
## Reindexing the Verices/nodes
CountryN_nodes$name<-as.integer(CountryN_nodes$name)
CountryN_nodes$IDN<-as.numeric(factor(CountryN_nodes$name))
# Merged/Mapped the IDN column into "to" and "from" column in edges.
CountryN_edges<-CountryN_nodes[,.(name,IDN)][CountryN_edges, on = c(name= "from")] %>%
CountryN_nodes[,.(name,IDN)][., on = c(name= "to")]
# Dropping unnecessary columns and renaming
CountryN_edges$name<-NULL
CountryN_edges$i.name<-NULL
colnames(CountryN_edges)[1]<- "from"
colnames(CountryN_edges)[2]<- "to"
# Creating Edges
edge_shapes<- list()
for(i in 1:nrow(CountryN_edges)){
v0<-CountryN_edges[i,]$from
v1<-CountryN_edges[i,]$to
edge_shape=list(
type="line",
line=list(color="#030303", width=0.3),
x0=Layout_For_Nodes$x.coor[[v0]],
y0=Layout_For_Nodes$y.coor[[v0]],
x1=Layout_For_Nodes$x.coor[[v1]],
y1=Layout_For_Nodes$y.coor[[v1]]
)
edge_shapes[[i]]<-edge_shape
}
axis<-list(title="", showgrid=FALSE, showticklabels=FALSE, zeroline=FALSE)
p<-layout(network,
title="Plotly Network",
shapes=edge_shapes,
xaxis=axis,
yaxis=axis
)
p
CountryN_edges$id <- seq_len(nrow(CountryN_edges))
#
# map projection
geo <- list(
# projection = list(type = 'azimuthal equal area'),
showland = TRUE,
landcolor = toRGB("gray95"),
countrycolor = toRGB("gray80")
)
plot_geo()%>%
add_markers(data= CountryN_nodes,
x = ~lon, y = ~lat, size = ~degree,
color = ~centrality, hoverinfo = "text", #, colorscale='Viridis'
text =~ ~paste(L$countries, "<br />",
"centrality: ", signif(L$centrality,2), "<br />",
"betweenness: ", signif(L$betweenness,2), , "<br />",
"degree: ", L$degree,, "<br />"))%>%
add_segments(
data = group_by(CountryN_edges,id),
x = ~startlon, xend = ~endlon,
y = ~startlat, yend = ~endlat,# width=~weight,
alpha = 0.3, size = I(1), hoverinfo = "none"
)%>%
layout(
title = 'Country Nodes Network on Map',
geo = geo, showlegend = FALSE
)
## Passing layout here doesn't seem to work
# layout(network,
# title="Plotly Network",
# shapes=edge_shapes,
# xaxis=axis,
# yaxis=axis
# )
Issues
I couldn’t identify a path to pass the tooltips text into plotlys for interactively
toWebGL() proved to be problematic.
V(net)$community<- igraph::cluster_walktrap(net)$membership
ggg<-ggnet2(net, node.size = sqrt(V(net)$degree)*6,
node.color = colorize(V(net)$community), node.label = V(net)$text,
edge.size = E(net)$weight, edge.color = "grey", label.size=2,
alpha = 0.5, mode = "kamadakawai") +
theme_blank()+
# https://github.com/ropensci/plotly/issues/842
theme(legend.position='none') #translate to hide legend in plotly
# mode = "kamadakawai"
# issue with tool tip
# I could not get the tool tip working in this form
ggg
df_net <- ggnetwork(net, layout = "kamadakawai")
# possible nice layout: kamadakawai, fruchtermanreingold
plot <- ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend), arrow.gap = 0.025) +
geom_edges(alpha = 0.25, arrow = arrow(length = unit(0.5, "lines"), type = "closed")) +
geom_nodes(aes(size = degree, color = betweenness, text=text)) +
ggtitle("Network Graph of Papers flows between Countries") +
theme_blank()
plot %>% ggplotly(tooltip = "text") %>% toWebGL()
# currenct issue, coulnd't get weight into the ggplot nicely
### Degree, the connections of edges
subnodes_attributes<-data.table(
nodes_id = as.integer(names(igraph::degree(decomposed_graph_list[[1]], mode = "all"))),
nodes_degree_all = igraph::degree(decomposed_graph_list[[1]], mode = "all"),
nodes_degree_out = igraph::degree(decomposed_graph_list[[1]], mode = "out"),
nodes_betweenness= igraph::betweenness(decomposed_graph_list[[1]]),
nodes_centrality = eigen_centrality(decomposed_graph_list[[1]])$vector)
# Exploring centrality
H_Degree_dt<-subnodes_attributes[order(-nodes_degree_all)] %>%
Nodes[., on=c(node_id="nodes_id")]%>%
.[!(Identity=="Addresses"),]%>%
head(10)
##Highcharter
#tooltip table
x<- c("NodeName", "Degree all","Degree out","Betweenness", "Centrality")
y<- c("{point.nameID}", (sprintf("{point.%s:.2f}",
c("nodes_degree_all", "nodes_degree_out",
"nodes_betweenness", "nodes_centrality"))))
tltip<- tooltip_table(x,y)
#plot
hchart(H_Degree_dt, "scatter", hcaes(nodes_degree_all, nodes_degree_out,
size= nodes_centrality, color=nodes_betweenness), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
hc_title(text="Network Attributes of Nodes in panama-paradise papers")%>%
hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)
# Exploring centrality
H_Centrality_dt<-subnodes_attributes[order(-nodes_centrality)] %>%
Nodes[., on=c(node_id="nodes_id")]%>%
.[!(Identity=="Addresses"),]%>%
head(15)
#plot
hchart(H_Centrality_dt, "scatter", hcaes(nodes_centrality, nodes_betweenness,
size= nodes_degree_all, color=nodes_degree_out), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
hc_title(text="Network Attributes of Nodes in panama-paradise papers")%>%
hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)%>%
hc_xAxis(type = "logarithmic")%>%
hc_yAxis(type = "logarithmic")
disabled
# subnodes_Exlarge<-data.table(as_data_frame(decomposed_graph_list[[1]], what = c("vertices")))
# subedges_Exlarge<-data.table(as_data_frame(decomposed_graph_list[[1]], what = c("edges")))
# subnet_Elarge_net<-graph.data.frame(subedges_Exlarge, subnodes_Exlarge, directed = F)
#
# # subnodes_cluster_lec<-cluster_leading_eigen(subnet_Elarge_net)
# set.seed(1)
# subnodes_cluster_prop<-cluster_label_prop(subnet_Elarge_net)
#
# # print(subnodes_cluster_prop)
# # modularity(subnodes_cluster_prop)
# # length(subnodes_cluster_prop)
# # # membership(subnodes_cluster_prop)
# # sizes(subnodes_cluster_prop)
#
#
# ## Extracting the nodes and its communities by membership in a cluster
# ex<-induced.subgraph(subnet_Elarge_net, which(membership(subnodes_cluster_prop)==35))